Profiling a hardware system generated by compiling a high level language onto a programmable logic device

ABSTRACT

A method of profiling a hardware system can include compiling a high level language program into an assembly language representation of the hardware system and translating instructions of the assembly language representation of the hardware system into a plurality of executable, software models. The models can be implemented using a high level modeling language for use with cycle accurate emulation. The method also can include instrumenting at least one of the plurality of models with code that, when executed, provides operating state information relating to the model as output and indicating expected behavior of the circuit by executing the models in an emulation environment.

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF THE INVENTION

This invention relates to the field of electronic circuit design and, more particularly, to profiling a hardware system generated from a high level language representation of the hardware system.

DESCRIPTION OF THE RELATED ART

Traditionally, designs for hardware systems, such as electronic circuits, have been specified using a hardware description language (HDL). Examples of HDLs can include, but are not limited to, Verilog and VHDL. HDLs allow circuit designers to design and document electronic systems at various levels of abstraction. Designs for programmable logic devices (PLDs), such as field programmable gate arrays (FPGAs) and application specific integrated circuits (ASICs), can be modeled using an HDL. The design then can be simulated and tested using appropriate software-based design and/or synthesis tools.

High level programming languages (HLLs) also can be used to design electronic circuits. An HLL, such as Fortran, C/C++, JAVA, or the like, can be used to write a programmatic implementation of an algorithm, which can be translated into a circuit design. This approach allows a developer to concentrate on an algorithmic solution to a problem rather than the specific hardware involved. A variety of different tools are available which can translate the HLL program into a synthesizable netlist or other software-based circuit representation.

One advantage of using an HLL is that many of the complexities of circuit design, particularly with regard to programming an FPGA, can be reduced. Many of the design decisions can be made by the implementation tools based upon an analysis of the HLL program. Still, the hardware that ultimately is generated from the HLL program, though functional, may be inefficient and, therefore, require refinement. In such cases, it becomes necessary to identify those portions of the circuit design that do require further refining or optimization.

It would be beneficial to provide a technique for profiling a hardware system created from an HLL program to identify potential shortcomings or bugs.

SUMMARY

One embodiment of the present invention relates to a method of profiling a hardware system. The method can include compiling a high level language program into an assembly language representation of the hardware system. Instructions of the assembly language representation of the hardware system can be translated into a plurality of cycle-accurate, software-emulation models. One or more of the plurality of models can be instrumented with code that, when executed, provides operating state information relating to the model as output. An indication of expected behavior of the hardware system can be provided by executing the models in an emulation environment.

The method also can include identifying at least one model that is in a read state for at least two consecutive cycles of an emulation of the hardware system. An indication can be provided which specifies that the component of the hardware system represented by the identified model can be shared. Compiling the high level language program can include identifying constructs of the high level language program and mapping the constructs to instructions and pseudo-instructions of the assembly language.

Translating instructions of the assembly language representation can include creating models of hardware components for the hardware system according to instructions of the assembly language. Models of first-in-first-outs (FIFOs) that couple the models of the hardware components for the hardware system according to operands of the instructions of the assembly language representation can be created.

Indicating expected behavior can include indicating an operating state of at least one instrumented model during emulation. In another aspect, indicating expected behavior can include storing operational state information, generated during emulation, for at least one of the plurality of instrumented models.

Another embodiment of the present invention can include a method of profiling a hardware system including compiling a high level language program into an assembly language representation of the hardware system and translating instructions of the assembly language representation of the hardware system into a plurality of hardware description language (HDL) models. One or more of the plurality of HDL models can be instrumented with code that, when implemented within a programmable logic device, instantiates hardware structure that provides operating state information for a component corresponding to the instrumented HDL model. The method further can include indicating expected behavior of the hardware system by configuring the programmable logic device using the instrumented HDL models and running a simulation with the programmable logic device.

The method further can include identifying at least one component instantiated by an instrumented HDL model that is in a read state for at least two consecutive cycles of a simulation. An indication can be provided which specifies that the identified component of the hardware system can be shared. Compiling the high level language program can include identifying constructs of the high level language program and mapping the constructs to instructions and pseudo-instructions of the assembly language.

Translating instructions of the assembly language representation can include creating HDL models of hardware components for the hardware system according to instructions of the assembly language. Translating instructions of the assembly language representation also can include creating HDL models of first-in-first-outs that link the HDL models of hardware components for the hardware system according to operands of the instructions of the assembly language representation.

Indicating expected behavior can include, within a host computing system, receiving operating state information during simulation for at least one component instantiated within the programmable logic device that corresponds to an instrumented HDL model. In another aspect, indicating expected behavior can include storing the operating state information within the host computing system.

Yet another embodiment of the present invention can include a computer program product having computer-usable code that, when executed by an information processing system, implements the various steps and/or functions disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

Presently preferred embodiments are shown in the drawings. It should be appreciated, however, that the invention is not limited to the precise arrangements and instrumentalities shown.

FIG. 1 is a block diagram illustrating a system for profiling a circuit design in accordance with one embodiment of the present invention.

FIG. 2 is a graphical representation of a netlist which has been generated in accordance with the embodiments disclosed herein.

FIG. 3 illustrates an example of a high level language (HLL) “if” construct.

FIG. 4 illustrates an assembly language translation of the HLL “if” construct shown in FIG. 3 in accordance with the inventive arrangements disclosed herein.

FIG. 5 is a schematic diagram illustrating a circuit design generated from the assembly language translation of FIG. 4 in accordance with the inventive arrangements disclosed herein.

FIG. 6 illustrates an example of an HLL “for” construct.

FIG. 7 illustrates an assembly language translation of the HLL “for” construct shown in FIG. 6 in accordance with the inventive arrangements disclosed herein.

FIG. 8 is a schematic diagram illustrating a circuit design generated from the assembly language translation of FIG. 7 in accordance with the inventive arrangements disclosed herein.

FIG. 9 illustrates another example of an HLL “for” construct.

FIG. 10 illustrates an assembly language translation of the HLL “for” construct shown in FIG. 9 in accordance with the inventive arrangements disclosed herein.

FIG. 11 is a schematic diagram illustrating a circuit design generated from the assembly language translation of FIG. 10 in accordance with the inventive arrangements disclosed herein.

FIG. 12 illustrates an example of an HLL finite impulse response (FIR) filter implementation.

FIG. 13 illustrates an assembly language translation of the HLL FIR implementation of FIG. 12 in accordance with the inventive arrangements disclosed herein.

FIG. 14 is a listing of operating state information for a circuit design in accordance with another embodiment of the present invention.

FIG. 15 is a listing of operating state information for a circuit design in accordance with another embodiment of the present invention.

FIG. 16 is a block diagram illustrating a system in accordance with another embodiment of the present invention.

FIG. 17 is a view of operating state information presented in a graphical format in accordance with another embodiment of the present invention.

FIG. 18 is a flow chart illustrating a method of profiling a hardware system in accordance with another embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

While the specification concludes with claims defining the features of the invention that are regarded as novel, it is believed that the invention will be better understood from a consideration of the description in conjunction with the drawings. As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the inventive arrangements in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of the invention.

The embodiments disclosed herein relate to profiling a hardware system. More particularly, a circuit design, originally developed from a high level language (HLL) program, can be profiled to determine whether one or more portions of the circuit design require further refinement. In accordance with the embodiments disclosed herein, an HLL program implementation of an algorithm can be compiled into an assembly language program. From the assembly language program, a netlist can be generated, from which a plurality of models can be created. The models can be instrumented with code that reports operating information relating to the models. This operating information can be reviewed to determine expected behavior of a circuit design that is generated by translating the HLL program. The operating information can be used to locate potential bugs or inefficiencies in the resulting circuit design.

Development of a circuit design in accordance with the embodiments disclosed herein can result in a highly-pipelined circuit design. Relying upon pipelining, as opposed to parallelization, allows circuit developers to think in terms of sequential steps rather than parallel processes, thereby achieving a more intuitive design methodology. The use of pipelining also can increase computational efficiency. A loop of “N” iterations, each requiring “C” cycles, would require N times C cycles to complete on a conventional von Neumann architecture-type machine. Using a pipelined machine, such a loop would require N+C cycles to complete.
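The cycle counts given above can be illustrated with a short sketch. The function names are hypothetical and are introduced only for illustration; the formulas are the N times C and N+C figures stated in the preceding paragraph.

```python
def sequential_cycles(n_iterations: int, cycles_per_iteration: int) -> int:
    """Cycles for a loop on a conventional machine: N times C."""
    return n_iterations * cycles_per_iteration


def pipelined_cycles(n_iterations: int, cycles_per_iteration: int) -> int:
    """Cycles for the same loop on a pipelined machine: N + C."""
    return n_iterations + cycles_per_iteration


# Example: a loop of 1000 iterations whose body takes 5 cycles.
n, c = 1000, 5
print(sequential_cycles(n, c), pipelined_cycles(n, c))
```

For large N the pipelined count is dominated by N rather than by the product of N and C, which is the efficiency gain the description refers to.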

Within a highly pipelined architecture, performance degradation can result from a variety of different reasons including, but not limited to, consumers waiting for data to appear at the input FIFO(s) of the consumers, producers waiting for space to become available on the output FIFO(s) of the producers, and memory congestion resulting in a memory throughput that is lower than needed. Consumers and producers can refer to instructions of programs. It is these instructions that are translated into hardware on a target device. It is useful to detect such potential problems by profiling the circuit design through emulation and/or simulation. Profiling can indicate a need for modifying the circuit design by, for example, adjusting FIFO depth and/or adjusting the number of memory requests served per clock cycle.
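The effect of FIFO depth on producer and consumer stalls can be sketched with a toy cycle-by-cycle model. This is a minimal illustration under stated assumptions, not the emulation models of the embodiments: the function `count_stalls`, its parameters, and the rule that the consumer acts before the producer within a cycle are all hypothetical.

```python
from collections import deque


def count_stalls(depth, items, consumer_ready):
    """Run one producer and one consumer over a bounded FIFO.

    depth          -- maximum number of entries the FIFO can hold
    items          -- number of values the producer must deliver
    consumer_ready -- function(cycle) -> bool, True when the consumer can read
    Returns (producer_stalls, consumer_stalls, total_cycles).
    """
    fifo = deque()
    produced = consumed = cycle = 0
    producer_stalls = consumer_stalls = 0
    while consumed < items:
        # Consumer side: a ready consumer with an empty input FIFO is stalled.
        if consumer_ready(cycle):
            if fifo:
                fifo.popleft()
                consumed += 1
            else:
                consumer_stalls += 1
        # Producer side: a producer facing a full output FIFO is stalled.
        if produced < items:
            if len(fifo) < depth:
                fifo.append(produced)
                produced += 1
            else:
                producer_stalls += 1
        cycle += 1
    return producer_stalls, consumer_stalls, cycle


# The consumer is busy for the first three cycles, then reads every cycle.
busy_at_start = lambda cycle: cycle >= 3
print(count_stalls(1, 4, busy_at_start))  # shallow FIFO
print(count_stalls(4, 4, busy_at_start))  # deeper FIFO
```

In this toy scenario the deeper FIFO absorbs the consumer's initial delay, so the producer never stalls. Profiling output of this kind is what would suggest adjusting FIFO depth.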

FIG. 1 is a block diagram illustrating a system 100 for profiling a circuit design in accordance with one embodiment of the present invention. As shown, an HLL program 105 can be loaded into an HLL compiler 110. The HLL program 105 can embody an algorithm or function that is to be translated into a circuit design for implementation in hardware. For example, the circuit design ultimately can be implemented using a target device such as a programmable logic device (PLD) or, more particularly, a field programmable gate array (FPGA). In any case, the HLL program 105 can be implemented using any of a variety of different HLLs, whether Fortran, C/C++, JAVA, or the like.

The HLL compiler 110 can parse the HLL program 105 and produce an intermediate format of the HLL program 105 as output. The compiler can be configured to parse the type of HLL in which the HLL program 105 is coded. As such, the compiler 110 can parse the HLL program 105 and resolve each sequential source program instruction of the HLL program 105 into its constituent parts. Further, the HLL compiler 110 can determine whether the HLL program 105 conforms to a defined standard syntax for the HLL.

The HLL compiler 110 then can operate upon the parsed HLL program 105 and translate the parsed HLL program 105 into the intermediate format. In one embodiment, the intermediate format can be an assembly language program 115, which, like the HLL program 105, embodies an algorithm to be implemented in hardware. The assembly language program 115 can be read by a human being and further, due to the characteristics of the assembly language in which the assembly language program 115 is implemented, can be edited by a designer to refine the circuit design if so desired. Thus, in one embodiment, the HLL program 105 can be compiled into an assembly language referred to as “CHiMPS” assembly instructions. CHiMPS is an acronym that stands for “compiling HLL into massively pipelined systems”.

CHiMPS assembly language, like conventional assembly languages, utilizes op-code mnemonics and operands. Within the CHiMPS assembly language, instructions and pseudo-instructions are used. The HLL compiler 110 can map the constructs of the HLL program 105 onto these instructions and pseudo-instructions. Generally, instructions cause some type of hardware to be generated, while pseudo-instructions provide information to the assembler 120. Instructions correspond to predefined hardware modules and operands of instructions correspond to FIFOs or registers. In other words, the instructions of the assembly language program 115 typically correlate to, and can be converted into, instantiations of predefined hardware modules. The predefined hardware modules act on the operands of the instructions, which correlate to FIFOs that couple the various hardware modules (instructions).
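The correspondence described above, in which instructions become hardware modules and operands become the FIFOs that couple them, can be sketched as follows. This is a minimal illustration only: the tuple representation of an instruction, the function `build_netlist`, and the module naming scheme are assumptions made for the example and are not the CHiMPS assembler or its syntax.

```python
from collections import defaultdict


def build_netlist(instructions):
    """Map each instruction to a module and each operand to a FIFO.

    instructions -- list of (opcode, input_registers, output_registers)
    Returns (modules, fifos), where fifos maps a register name to the
    module that writes it and the modules that read it.
    """
    modules = []
    fifos = defaultdict(lambda: {"producer": None, "consumers": []})
    for index, (opcode, inputs, outputs) in enumerate(instructions):
        module = f"{opcode}_{index}"  # one module instance per instruction
        modules.append(module)
        for register in inputs:
            fifos[register]["consumers"].append(module)
        for register in outputs:
            fifos[register]["producer"] = module
    return modules, dict(fifos)


# Hypothetical two-instruction program: t0 = a + b; result = t0 * c.
program = [
    ("add", ["a", "b"], ["t0"]),
    ("multiply", ["t0", "c"], ["result"]),
]
modules, fifos = build_netlist(program)
print(modules)
print(fifos["t0"])
```

Here the FIFO for `t0` links the adder (its producer) to the multiplier (its consumer), while `a`, `b`, and `c` have no producer inside the program and would be fed from outside.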

The assembly language program 115 can be provided to an assembler 120. The assembler 120 can process the assembly language program 115 and translate it into a netlist 125. The assembler 120 can be implemented as a single pass assembler. A preprocessor, however, can be included which can resolve any include files and define instructions. The netlist 125 can be a structural HDL netlist that specifies FIFOs and logic blocks.

The netlist 125, due to the use of CHiMPS, results in a circuit design that is likely to be highly pipelined. The netlist 125 further will specify a circuit design having structures corresponding to CHiMPS instructions with FIFOs coupling such structures. As noted, within such highly pipelined architectures, performance degradation can result from conditions such as data consumers waiting for data to appear at the input FIFO(s) of the consumers, producers waiting for space to become available on the output FIFO(s) of the producers, and memory congestion resulting in memory throughput that is lower than the needed bandwidth. It can be useful to detect such conditions by profiling the circuit design through emulation and/or simulation. Profiling can indicate bugs or errors which may be corrected by adjusting FIFO depth, adjusting the number of memory requests served per clock cycle, etc.

As shown, the netlist 125 can be provided to a generator 130. The generator 130 can be configured to translate the netlist 125 into any of a variety of different software-based models. In one embodiment, the generator 130 can create an emulation testbench 135 from the netlist 125. As known, a testbench refers to a circuit description that specifies and verifies the behavior of a device under test (DUT), in this case a circuit design for a PLD. A testbench also can refer to the code used to create a pre-determined input sequence to the DUT, as well as the code responsible for observing the response from the DUT.

The emulation testbench 135 can include one or more emulation models (not shown). In general, emulation refers to the process of duplicating, or replicating, the functions of one system with a second system such that the second system appears to behave like the first. For example, emulation can refer to the case in which software-based models are executed within a software test environment to reflect the behavior of the emulated system, in this case the circuit design. Emulation, in general, is considered to take a higher level view of a system than simulation. Emulation, however, still can provide sufficient information for diagnosing problems typically associated with highly pipelined architectures.

In one embodiment of the present invention, the emulation testbench 135 can be implemented using SystemC™. As known, SystemC™ provides hardware-oriented constructs in the form of a class library implemented in standard C++. It should be appreciated that while SystemC™ is presented as one way in which the emulation testbench 135 and emulation models can be implemented, other modeling systems, languages, and/or techniques can be used so long as such models are executable within a software-based test environment. Accordingly, the present invention is not intended to be limited by the particular format or manner in which the emulation testbench 135 and/or emulation models are implemented.

In this regard, the translation of the assembly language program 115 into a netlist 125 can be an optional step. In some cases, for example, based upon the type of emulation testbench 135 used, it may be more convenient to translate the assembly language program 115 into the netlist 125. Such is the case where standard tools are available for deriving the emulation testbench 135 from the netlist 125. In other cases, for example, where proprietary emulation models are used, the assembly language program 115 can be translated directly into the emulation testbench 135.

In any event, the emulation testbench 135 can be provided to a software testing environment 140. For example, if the emulation testbench 135 is implemented using SystemC™, a test platform capable of executing SystemC™ emulation models can be used. The particular software testing environment 140 that is used can vary according to the implementation of the emulation testbench 135.

In another embodiment, the generator 130 can translate the netlist 125 into a simulation testbench 145. Simulation, as compared with emulation, can refer to the case where an attempt is made to precisely model the state of the device being simulated, in this case the circuit design that is generated from the HLL program 105. For example, simulation can refer to the case where the simulation testbench 145 is implemented using HDL and HDL models, which can be used for simulating a circuit design within a software-based synthesis and simulation tool or to generate the configuration bitstream necessary for programming an actual PLD for testing using a hardware platform, i.e. hardware-based testing environment 150. As was the case with the emulation testbench 135, in another embodiment, the simulation testbench 145 can be derived directly from the assembly language program 115.

The simulation testbench 145 and simulation models can be implemented using an HDL such as VHDL, Verilog, or the like. While the simulation testbench 145 can be provided to a software-based simulation tool, in another embodiment, the simulation testbench 145 is used within the hardware-based testing environment 150. The hardware-based testing environment 150 can include a simulator executing within a suitable host computer system. The host computer system can be coupled with a hardware platform capable of hosting a target PLD within which the circuit design will be instantiated. The simulator can process the simulation testbench 145 to produce a configuration bitstream which can be loaded into the PLD disposed upon the hardware platform. In this embodiment, the circuit design can be simulated by configuring an actual PLD using the configuration bitstream generated from the simulation testbench 145. Testing of the PLD upon the hardware platform can be performed in cooperation with the host computer system executing the simulator.

If the netlist 125 is used to create the bitstream necessary for programming a PLD, once the PLD is programmed with the circuit design, the circuit design can be executed or run. At that time, execution threads can be identified rather than at the time the HLL program 105 is compiled. Each time an execution thread is generated, i.e. a new branch of a conditional branch is started or a new iteration and/or repetition of a loop is started, that execution thread can be identified. Each execution thread can be associated with an identifier, referred to as a sequence number. The sequence numbers can be used within the circuit design to emulate flow control of the HLL program 105. It should be appreciated, however, that other ways of preserving the order defined by the HLL program 105 can be used and that the embodiments disclosed herein are not limited to one particular technique of preserving such order.

Generally, each sequence number corresponds to a particular execution thread and conveys scheduling information for the circuit design, thereby facilitating a pipelined architecture and achieving a degree of parallelism within the circuit design. The circuit can execute the execution threads in parallel or in a pipelined fashion. This alleviates the need to add additional stages to branches from conditional statements so that each branch requires the same amount of time to complete.

Within the CHiMPS assembly language, pseudo-instructions provide information to the assembler 120 thereby providing context to the set of instructions following the pseudo-instruction. Examples of pseudo-instructions can include, but are not limited to, reg, unreg, call, enter, and exit. Some pseudo-instructions may indirectly cause hardware to be generated, such as the reg pseudo-instruction. The reg pseudo-instruction can lead to the creation of one or more FIFOs when those registers are used. It should be appreciated that the creation of the FIFOs is incidental to the instruction that creates the hardware and not to the pseudo-instruction that declared the hardware.

The syntax of the reg pseudo-instruction is: reg <list of registers>. The reg pseudo-instruction tells the assembler that the named registers in the list of registers will be used in upcoming instructions. The registers in the list will be created with a default width of 32 bits unless otherwise specified. The reg pseudo-instruction instructs the assembler to create FIFOs to carry the values of each register through the instructions that follow until the registers are “unreged”.

The unreg pseudo-instruction specifies a listing of registers that will no longer be used. The syntax of the unreg pseudo-instruction is: unreg <list of registers>. As with the reg pseudo-instruction, the operands of the pseudo-instruction are the registers within the list of registers. Any FIFOs associated with the listed registers can be trimmed back to the point of their last use. The name space will be purged of the named register(s), thereby freeing the registers for use again if needed.
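The name space bookkeeping implied by the reg and unreg pseudo-instructions can be sketched as a small class. This is an illustrative model only: the class `RegisterNamespace`, its methods, and the treatment of the default as a 32-bit register width are assumptions made for the example, and the trimming of FIFOs is not modeled.

```python
class RegisterNamespace:
    """Tracks which register names are currently declared, and their widths."""

    DEFAULT_WIDTH = 32  # assumed default register width in bits

    def __init__(self):
        self.widths = {}

    def reg(self, *names, width=DEFAULT_WIDTH):
        """Declare registers for use by upcoming instructions."""
        for name in names:
            if name in self.widths:
                raise ValueError(f"register {name!r} is already declared")
            self.widths[name] = width

    def unreg(self, *names):
        """Purge registers from the name space so the names can be reused."""
        for name in names:
            del self.widths[name]


namespace = RegisterNamespace()
namespace.reg("a", "b")        # both created with the default width
namespace.unreg("a")           # "a" is purged from the name space
namespace.reg("a", width=8)    # the name is free to be declared again
print(namespace.widths)
```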

The call pseudo-instruction will cause the assembler to search for a function with a matching name. The syntax for this pseudo-instruction is: call <function name>[;[<input registers>] [;<output registers>]]. The assembler will replace the call pseudo-instruction with the entire contents of the function. Thus, rather than including a reference to the called function, a full copy of the function can be made. The input registers specify the inputs to the function and the output registers will contain the outputs of the function. Default widths for the registers can be assumed unless otherwise stated within the call pseudo-instruction.

The enter pseudo-instruction defines a function that can be called. The syntax for the enter pseudo-instruction is: enter <function name>[;<input registers>]. The input registers specified serve as placeholders for registers that will be passed into the specified function from the calling function. The exit pseudo-instruction signals the end of the function defined by the enter pseudo-instruction. Code between the enter and exit pseudo-instructions will be copied wherever the call is made. The output registers specified will define which FIFOs are to be mapped to the output registers specified on the call statement.
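The copy-in-place behavior of call, enter, and exit can be sketched as a simple inlining pass. This is a minimal illustration under stated assumptions: the tuple and dictionary representations, the function `inline_calls`, the example function `scale_add`, and the convention that an instruction's last operand is its destination are all hypothetical. Nested calls and the renaming of function-local registers are not handled.

```python
def inline_calls(program, functions):
    """Replace every call with a full copy of the called function's body.

    program   -- list of (opcode, operands) tuples, or
                 ("call", name, input_registers, output_registers)
    functions -- name -> {"inputs": [...], "outputs": [...], "body": [...]}
    """
    expanded = []
    for instruction in program:
        if instruction[0] != "call":
            expanded.append(instruction)
            continue
        _, name, call_inputs, call_outputs = instruction
        function = functions[name]
        # Placeholder registers map onto the registers named at the call site.
        rename = dict(zip(function["inputs"], call_inputs))
        rename.update(zip(function["outputs"], call_outputs))
        for opcode, operands in function["body"]:
            expanded.append((opcode, [rename.get(r, r) for r in operands]))
    return expanded


# Hypothetical function body, as it would appear between enter and exit.
functions = {
    "scale_add": {
        "inputs": ["x", "y"],
        "outputs": ["z"],
        "body": [("multiply", ["x", "y", "p"]), ("add", ["p", "x", "z"])],
    }
}
program = [
    ("call", "scale_add", ["a", "b"], ["r"]),
    ("subtract", ["r", "a", "d"]),
    ("call", "scale_add", ["r", "c"], ["s"]),
]
expanded = inline_calls(program, functions)
for instruction in expanded:
    print(instruction)
```

Because each call is replaced by a copy, the two calls in this example yield two separate multiply and add instructions, and therefore two sets of hardware, rather than one shared block.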

Instructions, as noted, typically cause hardware to be instantiated. Generally, one instruction causes one instantiation of a hardware component. Instructions are composed largely of operational instructions and flow-control instructions. Operational instructions wait for all arguments to appear on the input FIFOs. When those arguments are available, the operational instruction performs the specified function and one or more output FIFOs are provided with the result. By comparison, flow-control instructions generally split or merge the pipeline based on information either from prior instructions or by using the sequence numbers corresponding to the execution threads of the HLL program.

Operational instructions can include, but are not limited to, integer arithmetic instructions, logical operation instructions, and floating-point arithmetic instructions. In illustration, integer arithmetic functions can include addition, subtraction, multiplication, and division, which typically take one cycle to operate. Still, more complex instructions, such as divide, can require additional cycles. Logical operation instructions can include shift operations such as logical shift right or left and arithmetic shift right or left. These operations can be handled within registers and typically require no cycles to complete.

Flow control instructions can include conditional branching and looping instructions. Examples of conditional branching instructions can include the demux, branch, unbranch, and mux instructions. The syntax of the demux instruction is: demux <muxid>[:n] <condition>;<branch0id>;<branch1id>;<list of registers>. The demux instruction examines the <condition>. Depending upon the value of <condition>, the instruction de-multiplexes the specified registers along with the sequence number corresponding to the code associated with either branch instruction specified by <branch0id> or <branch1id>.

If the value of the <condition> operand is zero, then the code labeled <branch0> receives the registers, otherwise <branch1> receives the registers. The <muxid> operand is a unique identifier used to match a particular demux instruction up with both the branches and the corresponding mux instruction. The number “n” indicates the depth of the FIFO that needs to be allocated for registers, including the sequence number, that are not passed into the conditional. Generally, the number should be equal to the number of instructions in the longer of the two branches. If not specified, a default depth can be used.

The branch instruction initiates code belonging to a specific branch as specified by the <branchid> operand of the demux instruction. The syntax of the instruction is: branch <branchid>;<list of registers>. Branch is actually a pseudo-instruction as no hardware is generated. A new register namespace is created containing only those registers in the <list of registers>, which must exactly match the list specified by the demux instruction.

The unbranch instruction, like the branch instruction, is a pseudo-instruction. The unbranch instruction indicates the end of a particular branch. The register namespace is restored to its value prior to the most recent branch instruction with a matching <branchid>. The syntax for the unbranch pseudo-instruction is: unbranch <branchid>;<list of registers>.

The mux instruction multiplexes the list of registers back together depending upon the sequence number passed by the demux instruction having a matching <muxid>. The syntax for the mux instruction is: mux <branchid>;<list of registers>.
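The way demux and mux cooperate through sequence numbers can be sketched in software. This is a behavioral illustration only and does not reflect the generated hardware: the function `run_conditional`, the use of Python callables for the branch bodies, and the use of the item index as the sequence number are assumptions made for the example.

```python
from collections import deque


def run_conditional(items, branch0, branch1):
    """Route items through one of two branches and merge results in order.

    items   -- list of (condition, value) pairs, one per execution thread
    branch0 -- function applied when condition is zero
    branch1 -- function applied when condition is non-zero
    """
    order = deque()                  # sequence numbers passed from demux to mux
    queues = (deque(), deque())      # one output FIFO per branch
    for sequence, (condition, value) in enumerate(items):
        # demux: steer the registers and sequence number to a single branch.
        branch = 0 if condition == 0 else 1
        body = branch0 if branch == 0 else branch1
        queues[branch].append((sequence, body(value)))
        order.append(sequence)
    merged = []
    while order:
        # mux: take the result whose sequence number is next in program order.
        sequence = order.popleft()
        for queue in queues:
            if queue and queue[0][0] == sequence:
                merged.append(queue.popleft()[1])
                break
    return merged


results = run_conditional(
    [(0, 1), (1, 2), (0, 3)],
    branch0=lambda value: value + 10,
    branch1=lambda value: value * 100,
)
print(results)
```

The sequence numbers let the merge step restore the original ordering even though the two branches are separate pipelines that may take different amounts of time.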

FIG. 2 is a graphical representation of a netlist 200 which has been generated in accordance with the embodiments disclosed herein. The instructions 205, 210, 215, 220, and 225 of the assembly language correspond to hardware modules in the resulting circuit design. The arguments of the assembly language instructions 205-225 are depicted as registers, or FIFOs, communicatively linking the various instructions 205-225. In general, a one to one correspondence can persist between the assembly language instructions, the models, as well as the hardware modules or components that ultimately are instantiated in a PLD.

State information as well as intermediate results produced by the instructions 205-225 flow through the FIFOs. Array structures can be implemented as memory blocks, such as memory 230. As shown, instructions, i.e. instructions 210 and 225, can access the memory 230 via FIFOs. An instruction can operate whenever data is available on the input FIFO(s) to that instruction and then output results onto the output FIFO(s) for the instruction.

As shown in FIG. 2, the hardware generated using CHiMPS instructions effectively forms a system of data producers and data consumers. The instructions, i.e. the op-codes, correspond to hardware modules which can be viewed as data consumers and/or data producers. The operands of the instructions correspond to FIFOs that link the data producers and consumers. Within this framework, each instruction of the assembly language program can be viewed as a state machine having three states: a read (R) state, a write (W) state, and an execute (E) state.

A given instruction, and thus model, can be said to be in a read state while waiting for data to appear on the input FIFO(s) of that instruction. When data is available, the instruction can enter the execute state. Upon completion of execution, the instruction can switch to the write state. Once data is written to the output FIFO(s) of the instruction, the instruction can switch back to the read state. In general, if input data to an instruction is available and the output FIFO(s) are not full, the instruction can have a one-cycle latency in that the instruction can move through the read, execute, and write states within that clock cycle.
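An instrumented model of this three-state behavior can be sketched as follows. This is a simplified illustration rather than the SystemC™ models of the embodiments: the class `InstructionModel`, the helper functions, and the convention of logging "R" for a cycle stalled on input, "W" for a cycle stalled on a full output FIFO, and "E" for a cycle in which the instruction completes are assumptions made for the example. The sharing check follows the criterion, stated in the summary, of a model that is in a read state for at least two consecutive cycles.

```python
from collections import deque


class InstructionModel:
    """Software model of one instruction, instrumented with a state trace."""

    def __init__(self, name, operation, inputs, output, capacity):
        self.name = name
        self.operation = operation
        self.inputs = inputs        # list of input FIFOs (deques)
        self.output = output        # output FIFO (deque)
        self.capacity = capacity    # maximum entries in the output FIFO
        self.trace = []             # instrumentation: one state per cycle

    def step(self):
        """Advance one cycle and record the resulting operating state."""
        if any(len(fifo) == 0 for fifo in self.inputs):
            state = "R"             # waiting for data on an input FIFO
        elif len(self.output) >= self.capacity:
            state = "W"             # waiting for space on the output FIFO
        else:
            arguments = [fifo.popleft() for fifo in self.inputs]
            self.output.append(self.operation(*arguments))
            state = "E"             # read, execute, and write in one cycle
        self.trace.append(state)
        return state


def longest_run(trace, state):
    """Length of the longest run of consecutive cycles spent in a state."""
    best = current = 0
    for entry in trace:
        current = current + 1 if entry == state else 0
        best = max(best, current)
    return best


def sharing_candidates(models, threshold=2):
    """Names of models that sat in the read state for enough cycles in a row."""
    return [m.name for m in models if longest_run(m.trace, "R") >= threshold]


a, b, out = deque([1, 2]), deque([10, 20]), deque()
adder = InstructionModel("add", lambda x, y: x + y, [a, b], out, capacity=1)
adder.step()        # executes: inputs present, output has space
adder.step()        # stalls: output FIFO is full
out.popleft()       # a downstream consumer drains the output
adder.step()        # executes again
adder.step()        # stalls: inputs are exhausted
adder.step()        # stalls again
print(adder.trace, sharing_candidates([adder]))
```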

FIG. 3 illustrates an example of an HLL “if” construct. A construct refers to a data structure used for a particular purpose. A construct can refer to a single programming language statement or a collection of more than one statement such as a loop, method, function, or the like, where the collection has a particular function or purpose. Constructs also are defined by organizations such as the Institute of Electrical and Electronics Engineers (IEEE) and the American National Standards Institute (ANSI). These organizations set forth standards for programming languages such as C, C++, Verilog, and VHDL, with each standard defining the available constructs for a given language.

In any case, the “if” construct illustrated in FIG. 3 can be incorporated into a larger HLL programmatic representation of a hardware design. When provided to a compiler as described herein, the constructs of the HLL program can be identified and an assembly language representation in the CHiMPS assembly language can be generated. FIG. 4 illustrates the CHiMPS assembly language translation or representation of the HLL “if” construct. The CHiMPS code shown in FIG. 4 illustrates the conditional branching instructions demux, branch, unbranch, and mux described above.

From the assembly language representation shown in FIG. 4, the compiler generates a netlist. The netlist specifies the pipelined hardware configuration depicted in FIG. 5. As shown, the instructions of the assembly language representation have been transformed into hardware instantiations and the operands have become FIFOs linking the hardware instantiations. In this case, the add instruction corresponds with the compare hardware module.

Generally, a loop in the CHiMPS assembly language includes four stages. The stages include initialization, test for exit condition, body, and iterator. The initialization can be performed prior to the start of the loop. The three other stages can execute iteratively and should occur in parallel. The following list specifies the general structure of a loop with the stages being shown in bold: initialization code, begin instruction, test for exit, loop instruction, loop body, iterator instruction, iterator code, iterate instruction, end instruction.

It should be appreciated that while the iterator code is shown after the loop body, it executes in parallel with the loop body. Accordingly, any modification of the loop variables which occurs in the iterator stage will not be visible to the loop body stage until the next iteration. Further, the loop body cannot affect the iterator variables. Any instructions that do so need to be moved or replicated within the iterator code.

Looping instructions within the CHiMPS assembly language can include the begin, loop, end, iterator, and iterate instructions alluded to above. The syntax for the begin instruction is: begin <loopid>;<loop registers>;<iterator registers>. The begin instruction creates a new register called the end FIFO which is pushed to the corresponding end instruction. If the <iterator registers> receive values, these values replace the copies stored in the internal memory and a new set of outputs is sent downstream. If the end FIFO becomes empty, the loop restarts. The begin instruction further generates a local sequence number that is passed to the instructions inside the loop and then incremented each time values are passed downstream.

The loop instruction de-multiplexes the loop registers either to the body or to the end instruction. If the condition is nonzero, the loop registers are de-multiplexed to the body. Additionally, the iterator registers are copied to the iterator instruction. If the condition is zero, the loop registers are de-multiplexed to the end instruction. The syntax for the loop instruction is: loop <loopid>;<condition>;<loop registers>;<iterator registers>.
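The routing performed by the loop instruction can be illustrated with a short Python sketch. The function name loop_instruction and the use of deques to stand in for the FIFOs feeding the loop body, the end instruction, and the iterator instruction are assumptions made for illustration only.

```python
from collections import deque


def loop_instruction(condition, loop_regs, iterator_regs, body, end, iterator):
    """Route one set of registers the way the loop instruction is described.

    body, end, and iterator are deques standing in for the FIFOs that feed
    the loop body, the end instruction, and the iterator instruction.
    """
    if condition != 0:
        body.append(loop_regs)          # loop continues: registers go to the body
        iterator.append(iterator_regs)  # iterator registers are copied onward
    else:
        end.append(loop_regs)           # exit condition met: registers go to end
```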

The iterator instruction identifies the start of the iterator code for the identified loop as well as the registers that will be passed into the iterator code from the loop instruction. The syntax for the iterator instruction is: iterator <loopid>;<iterator registers>. The iterate instruction identifies the end of the iterator code for the loop so identified along with the registers that will be passed back to the begin statement for the next iteration. The syntax for the iterate instruction is: iterate <loopid>;<iterator registers>.

The end instruction identifies the end of the loop code along with the result registers that will be passed on at the end of the loop. The syntax of the end instruction is: end <loopid>;<result-regs>. The end instruction receives registers from the loop body and end of the loop from the loop instruction. With the registers from the loop instruction, the end instruction pulls the sequence number from the end FIFO. During execution of the loop, the end instruction continually discards the result registers from the loop, saving the last set. Once the loop ends, the end instruction waits until the result FIFOs have a sequence number which is exactly equal to the sequence number being passed from the end instruction minus one. This set of resulting values is passed on with the others being discarded.

FIG. 6 illustrates an example of an HLL “for” construct. The HLL language construct can be included in a larger HLL program. The compiler can generate a CHiMPS assembly language representation or translation of the “for” construct which is illustrated in FIG. 7. The example shown in FIG. 7 illustrates the looping capability and architecture of the CHiMPS assembly language as described with reference to the begin, loop, iterator, iterate, and end instructions.

The circuitry resulting from the processing of the assembly language of FIG. 7 is illustrated in FIG. 8. As noted, each instruction has been instantiated as a hardware module. The operands of the assembly language instructions have become FIFOs linking the various hardware instantiations. Further, the sequence numbers, which correspond to the execution threads of the HLL program, now are used for controlling signal flow through the resulting hardware configuration. As noted, this alleviates the need for including additional stages in the hardware design to ensure that each leg of a conditional branch matches the other in terms of timing.

FIG. 9 illustrates another example of an HLL “for” construct. The assembly language representation of the HLL “for” construct is illustrated in FIG. 10. In this case, the assembly language representation includes two additional instructions. The wait and sync instructions are included and implement aspects of the looping capability of the CHiMPS assembly language. At times, a value is altered inside of a loop and the resulting value is to be used in the next iteration of the loop. The wait and sync instructions implement this functionality.

As noted, the wait instruction is used for synchronizing modified data and data used inside a loop. The syntax of the wait instruction is: wait <waitid>;<wait registers>. The wait instruction receives two sets of registers. One set of registers, denoted as “A”, flows from the prior instruction. The other set of registers, denoted as “B”, flows from the sync instruction that corresponds with the wait instruction. The wait instruction stores an expected sequence number, denoted as “E”, that starts at −1 and is used to select which registers are to be passed through.

The sequence number E is evaluated against the sequence number corresponding to register set A. The registers can be passed through in one of two different ways. If the sequence number in set A is greater than that stored in E, then the A set of registers is passed on. In that case the expected sequence number E is set to the one in set A plus 1. If the sequence number in set A is equal to that stored in E, the B set of registers is pulled off and discarded until a set with a sequence number equal to the sequence number stored in E minus 1 is found. When found, those registers are passed on and the sequence number stored in E is incremented.
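The two selection cases for the wait instruction can be sketched in Python as shown below. The class name WaitModel, the select method, and the representation of each register set as a (sequence number, registers) tuple are assumptions made for this illustration. The case in which the sequence number in set A is less than E is not addressed in the description above, so the sketch simply raises an error for it.

```python
from collections import deque


class WaitModel:
    """Hypothetical model of the register selection done by a wait instruction."""

    def __init__(self):
        self.expected = -1  # expected sequence number E, starting at -1

    def select(self, a_set, b_fifo):
        """Return the register set to pass on, or None if set B has no match yet.

        a_set is a (sequence_number, registers) tuple from the prior
        instruction; b_fifo is a deque of such tuples fed back by the
        matching sync instruction.
        """
        seq_a, _ = a_set
        if seq_a > self.expected:
            # First case: pass set A on and expect the following sequence number.
            self.expected = seq_a + 1
            return a_set
        if seq_a == self.expected:
            # Second case: discard B sets until one carries sequence number E - 1.
            while b_fifo:
                candidate = b_fifo.popleft()
                if candidate[0] == self.expected - 1:
                    self.expected += 1
                    return candidate
            return None  # the matching B set has not arrived yet
        raise ValueError("sequence number in set A below E is not described")
```

For example, a first set A with sequence number 0 is passed through and E becomes 1. A following set A with sequence number 1 then causes the B set with sequence number 0 to be passed on, so the value modified in the prior iteration is the one that is used.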

The sync instruction is matched with a wait instruction, and as such, the arguments must be identical. The syntax for the sync instruction is: sync <loopid>, <list of registers>. The sync instruction duplicates registers in the register list and sends one set of registers on and the other set of registers back to the B side of the wait instruction. The wait and sync instructions must be located before and after the first and last references of variables modified during a loop. The circuitry resulting from the assembly language shown in FIG. 10 is depicted in FIG. 11.

FIG. 12 illustrates an example of an HLL finite impulse response (FIR) filter implementation. The assembly language representation of the HLL FIR filter is illustrated in FIG. 13.

FIG. 14 is a listing of operating state information for a circuit design in accordance with another embodiment of the present invention. As the models are created, the generator can instrument the models with appropriate code which causes each model to output or report its operating state at a given time, or cycle, during emulation or simulation, as the case may be. In the embodiment in which emulation models are created, the emulation models can be instrumented with code which causes one or more, or each, model to output its operating state while executing during emulation. In the embodiment in which HDL models are created, the HDL models can be instrumented with HDL which causes the necessary hardware within the target PLD for reporting such operating state information for various hardware modules to be instantiated. The state information can be provided to the host computer system through a communication link between the hardware platform and the host computing system.
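As one illustration of instrumenting an emulation model, the Python sketch below wraps a per-cycle step function so that each call also reports the operating state of the model. The function name instrument, the sink callable, and the "cycle name state" line layout are assumptions made for the example and are not taken from FIG. 14.

```python
def instrument(step, name, sink):
    """Wrap a model's per-cycle step function so it reports its state.

    step(cycle) must return a state label such as "R" or "W"; sink is any
    callable that accepts one formatted report line (for example, the write
    method of an open text file or the append method of a list).
    """
    def reporting_step(cycle):
        state = step(cycle)
        sink(f"{cycle} {name} {state}")  # hypothetical line layout
        return state
    return reporting_step
```

Passing the append method of a list as the sink collects the report lines in memory, while passing the write method of a file stores them for later use.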

In any case, the operating state information of FIG. 14 is an example of the sort of output that can be generated, whether in an emulation environment or a simulation environment. The operating state information includes data for a plurality of different models. As shown, the operating state information indicates the particular time during the simulation or emulation in which the data was generated as well as the particular component or model responsible for generating that operating state information. In this case, since each model has a single clock cycle operation latency, the execution state is not shown. The “R” indicates the model was in the read state while a “W” is indicative of the write state.

FIG. 15 is a listing of operating state information for a circuit design in accordance with another embodiment of the present invention. The operating state information shown in FIG. 15 illustrates a case in which fine resolution data with respect to the operating state of a particular model is desired. As shown, the operating state of a particular model, i.e. the “12_add_1” model, is shown throughout the course of a given simulation or emulation, as the case may be.

The state information that is generated by the various models can be used for a variety of diagnostic and performance enhancing functions. In one embodiment, if a particular model is in a read state for at least two consecutive cycles of an emulation of the circuit design, that condition can be indicative that the component represented by the model is waiting for data. As such, the component is a candidate for sharing, e.g., to be used for another function during the time in which the component waits. A similar conclusion can be drawn from a hardware component instantiated within a PLD as a result of an instrumented HDL model. In one embodiment, the operating state information, whether evaluated as generated, or after being generated, can be parsed such that any component that is a candidate for sharing can be identified or flagged along with the relevant cycles in which such component can be shared.
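The identification of sharing candidates can be sketched in Python as follows. The function name sharing_candidates and the record layout of (cycle, model name, state) tuples are assumptions made for illustration; the sketch further assumes one record per model per cycle.

```python
from collections import defaultdict


def sharing_candidates(records, min_run=2):
    """Return {model: [(first_cycle, last_cycle), ...]} for read-state runs.

    records is an iterable of (cycle, model_name, state) tuples in which the
    state "R" denotes the read state. Only runs of at least min_run
    consecutive cycles are reported.
    """
    by_model = defaultdict(list)
    for cycle, model, state in records:
        by_model[model].append((cycle, state))

    candidates = {}
    for model, samples in by_model.items():
        runs = []
        start = end = None
        for cycle, state in sorted(samples):
            if state == "R" and end is not None and cycle == end + 1:
                end = cycle                    # extend the current read run
                continue
            if start is not None and end - start + 1 >= min_run:
                runs.append((start, end))      # close a long-enough run
            start = end = cycle if state == "R" else None
        if start is not None and end - start + 1 >= min_run:
            runs.append((start, end))          # run still open at end of trace
        if runs:
            candidates[model] = runs
    return candidates
```

Each reported pair gives the first and last cycle of a read-state run, i.e., the cycles during which the corresponding component could be considered for another function.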

FIGS. 14 and 15 have been provided for purposes of illustration only. It should be appreciated that the manner in which operating state information is provided can vary significantly. In one embodiment of the present invention, operating state information is written to a text file. The operating state information can be viewed in real time as it is generated and/or stored for later use, whether in a flat file, a database, or the like. If desired, graphical user interface (GUI) based programs can be written to visualize the operating state information in a user selected manner. This visualization can be performed after the fact or can be performed dynamically as the emulation or simulation is executing. For example, a graphical display of pending reads, i.e. outstanding memory requests, throughout the timeline of the simulation can be provided. Such a view can be modified to portray a snapshot of a given point in time, an average over a portion or the entire simulation, or updated dynamically as the simulation progresses.

FIG. 16 is a block diagram illustrating a system 1600 in accordance with another embodiment of the present invention. As shown, the system 1600 includes a CHiMPS emulator 1605 which can execute one or more emulation models for a circuit design and an emulation data store 1610. The emulation data store 1610 can receive a stream of model state information from the CHiMPS emulator 1605 as a circuit design is emulated. The model state information for a circuit design during emulation can be presented through a dynamic display 1615, for example in graphical form. The dynamic display 1615 further can receive model state information from prior emulations and compare such data, i.e., as illustrated by adder 1620. Thus, not only can data for a current emulation be viewed dynamically, but information can be dynamically contrasted with emulation data from prior emulations or shown concurrently.

FIG. 17 is a view of operating state information presented in a graphical format in accordance with another embodiment of the present invention. In this case, the view depicts higher level information relating to the average number of wait cycles before a read operation can be executed during a simulation. Once the operating state information is collected, or as it is collected, the data can be manipulated and/or displayed in virtually any manner desired.
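One way such an average could be derived from collected state information is sketched below in Python. The function name average_wait_cycles is hypothetical, and the sketch rests on an assumed definition: a wait is a run of consecutive read-state cycles for one model that is followed by some other state. The computation underlying FIG. 17 is not specified here and may differ.

```python
from itertools import groupby


def average_wait_cycles(states):
    """Mean length of the read-state runs that are followed by another state.

    states is a sequence of per-cycle state labels for a single model,
    for example the string "RRRWRW".
    """
    runs = [(label, len(list(group))) for label, group in groupby(states)]
    # A trailing read run never completed, so it is left out of the average.
    completed = [n for i, (label, n) in enumerate(runs)
                 if label == "R" and i < len(runs) - 1]
    return sum(completed) / len(completed) if completed else 0.0
```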

In another embodiment, hierarchy information corresponding to individual instructions of the HLL program is collected and evaluated. For example, correlation data describing which portions of HLL code correspond to which assembly language program instructions can be maintained. Similarly, correlation data indicating which models correspond to which assembly language instructions can be maintained. This allows a developer to associate particular items of profile information to particular items of code, whether at the HLL or assembly language level. Thus, a developer can review the performance of a particular set of instructions since those instructions can be correlated with the emulation models.

FIG. 18 is a flow chart illustrating a method 1800 of profiling a hardware system in accordance with another embodiment of the present invention. The method 1800 can begin in step 1805 where an HLL program implementation of an algorithm to be implemented in hardware is identified. In step 1810, the HLL program can be compiled into an assembly language program. As noted, the assembly language program also can embody the algorithm to be implemented in hardware, i.e., translated into a circuit design. In one embodiment, the assembly language program can be implemented using the CHiMPS assembly language as described herein.

In step 1815, a netlist is generated from the assembly language program. If emulation is to be performed with respect to the circuit design, the method continues to step 1820, where emulation models are created from the netlist. In step 1825, the emulation models are instrumented with code that reports or provides operating state information pertaining to the model during emulation.

In step 1830, the emulation models are loaded into an emulation environment. In step 1835, the emulation of the circuit design can be run. In step 1840, the operational state information for the various models of the circuit design that are executing within the emulation environment is collected and/or stored. The information, as discussed herein, can be presented in various formats such that selected indicators of circuit design behavior can be visualized. This allows a designer to identify potential problems with the circuit design hardware, particularly in relation to pipelined architectures.

If simulation is to be performed, the method 1800 proceeds from step 1815 to step 1845, where HDL models for the circuit design can be created from the netlist. In step 1850, the HDL models are instrumented with code that instantiates operational state reporting structures, i.e., circuit elements, for the various components and/or modules within a target PLD.

In step 1855, a configuration bitstream is generated from the HDL models and in step 1860, the target PLD upon the hardware platform can be loaded with the configuration bitstream, thereby instantiating a version of the circuit design to be simulated. In step 1865, simulation of the circuit design can begin. As noted, the simulation can include the target PLD operating in coordination with a software-based simulator executing within a host computer system. Accordingly, in step 1870, the operational state of various modules or components within the target PLD, as simulation continues, is collected and/or stored. This information is sent from the hardware platform to the host computer system via a communication link. The operational state information is presented in any of a variety of different formats to indicate one or more selected indicators of circuit design behavior.

It should be appreciated that in yet another embodiment, the HDL models are used within a purely software-based testing environment in which no target PLD or hardware platform is used. In that case, the HDL models are instrumented with reporting code similar to the manner in which the emulation models are instrumented. The HDL models are executed within a simulation environment, for example a test bench, such that as the HDL models execute, one or more of the HDL models can report operating state information to the software simulator.

One or more of the block diagrams and/or flow charts illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block may represent a module, segment, or portion of computer program, e.g., computer-usable program code, which can cause an information processing system to perform the functions described herein. The computer programs, or computer-usable program code, can be stored or embodied in any of a variety of computer program products. The computer program products can include computer-readable or computer-usable media such as optical media, magnetic media, computer memory, or the like.

The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software can be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention also can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods.

The terms “computer program”, “software”, “application”, variants and/or combinations thereof, in the present context, mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form. For example, a computer program can include, but is not limited to, a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.

The terms “a” and “an”, as used herein, are defined as one or more than one. The term “plurality”, as used herein, is defined as two or more than two. The term “another”, as used herein, is defined as at least a second or more. The terms “including” and/or “having”, as used herein, are defined as comprising, i.e. open language. The term “coupled”, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically, i.e. communicatively linked through a communication channel or pathway or another component or system.

This invention can be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.

1. A method of profiling a hardware system comprising: compiling a high level language program into an assembly language representation of a hardware system, wherein the high level language program is specified using a language other than a hardware description language; translating instructions of the assembly language representation of the hardware system into a plurality of cycle-accurate, software-emulation models; wherein each software-emulation model corresponds to a respective one of the assembly language instructions, and the models are communicatively linked by FIFOs to form a pipelined architecture; instrumenting at least one of the plurality of models with code that, when executed, provides operating state information relating to the model as output; wherein the operating state information includes read state data, the at least one model is in a read state while the at least one model is waiting for data from another model to appear on an input FIFO to the at least one model, and the read state data indicates a number of consecutive cycles of an emulation the at least one model is in the read state; and indicating expected behavior of the hardware system by executing the models in an emulation environment on a computer system.
2. The method of claim 1, further comprising: identifying at least one model that is in a read state for at least two consecutive cycles of an emulation; and indicating that a component of the circuit represented by the identified model is available for sharing in performing another function during at least a portion of the time the at least one model is in the read state.
3. The method of claim 1, wherein compiling the high level language program further comprises: identifying constructs of the high level language program; and mapping the constructs to instructions and pseudo-instructions of the assembly language.
4. The method of claim 1, wherein translating instructions of the assembly language representation further comprises creating models of hardware components for the circuit design according to instructions of the assembly language.
5. The method of claim 4, wherein translating instructions of the assembly language representation further comprises creating models of first-in-first-outs that couple the models of hardware components for the hardware system according to operands of the instructions of the assembly language representation.
6. The method of claim 1, wherein indicating expected behavior further comprises indicating an operating state of at least one instrumented model during emulation.
7. The method of claim 1, wherein indicating expected behavior further comprises storing operational state information, generated during emulation, for at least one of the plurality of instrumented models.
8. A method of profiling a hardware system comprising: compiling a high level language program into an assembly language representation of a hardware system, wherein the high level language program is specified using a language other than a hardware description language; translating instructions of the assembly language representation of the hardware system into a plurality of hardware description language (HDL) models; wherein each HDL model corresponds to a respective one of the assembly language instructions, and the models are communicatively linked by FIFOs to form a pipelined architecture; instrumenting at least one of the plurality of HDL models with code that, when implemented within a programmable logic device, instantiates hardware structure that provides operating state information for a component corresponding to the instrumented HDL model; wherein the operating state information includes read state data, the at least one model is in a read state while the at least one model is waiting for data from another model to appear on an input FIFO to the at least one model, and the read state data indicates a number of consecutive cycles of a simulation the at least one model is in the read state; and indicating expected behavior of the hardware system by configuring the programmable logic device using the instrumented HDL models and running a simulation with the programmable logic device.
9. The method of claim 8, further comprising identifying at least one component instantiated by an instrumented HDL model that is in a read state for at least two consecutive cycles of a simulation; and indicating that the component of the hardware system is available for sharing in performing another function during at least a portion of the time the at least one model is in the read state.
10. The method of claim 9, wherein compiling the high level language representation further comprises: identifying constructs of the high level language representation; and mapping the constructs to instructions and pseudo-instructions of the assembly language.
11. The method of claim 8, wherein translating instructions of the assembly language representation further comprises creating HDL models of hardware components for the hardware system according to instructions of the assembly language.
12. The method of claim 11, wherein translating instructions of the assembly language representation further comprises creating HDL models of first-in-first-outs that link the HDL models of hardware components for the hardware system according to operands of the instructions of the assembly language representation.
13. The method of claim 8, wherein indicating expected behavior further comprises, within a host computing system, receiving operating state information during simulation for at least one component that corresponds to an instrumented HDL model that is instantiated within the programmable logic device.
14. The method of claim 13, further comprising storing the operating state information within the host computing system.
15. A computer program product comprising: a computer-usable medium having stored thereon computer-usable program code that profiles a hardware system, said computer program product including: computer-usable program code that compiles a high level language program into an assembly language representation of a hardware system, wherein the high level language program is specified using a language other than a hardware description language; computer-usable program code that translates instructions of an assembly language representation of the hardware system into a plurality of executable, software models; wherein each software model corresponds to a respective one of the assembly language instructions, and the models are communicatively linked by FIFOs to form a pipelined architecture; computer-usable program code that instruments at least one of the plurality of models with code that causes operational information relating to the model to be provided as output; wherein the operating state information includes read state data, the at least one model is in a read state while the at least one model is waiting for data from another model to appear on an input FIFO to the at least one model, and the read state data indicates a number of consecutive cycles the at least one model is in the read state; and computer-usable program code that indicates expected behavior of the circuit by executing the models.
16. The computer program product of claim 15, wherein the models are implemented using a high level modeling language for use with cycle accurate emulation, wherein the computer-usable program code that translates instructions of the assembly language representation further comprises: computer-usable program code that creates models of hardware components for the circuit design according to instructions of the assembly language; and computer-usable program code that creates models of first-in-first-outs that couple the models of hardware components for the hardware system according to operands of the instructions of the assembly language representation.
17. The computer program product of claim 16, wherein the computer-usable program code that indicates expected behavior further comprises computer-usable program code that indicates an operating state of at least one instrumented model during emulation.
18. The computer program product of claim 15, wherein the models are hardware description language (HDL) models implemented using an HDL, wherein the computer-usable program code that translates instructions of the assembly language representation further comprises: computer-usable program code that creates HDL models of hardware components for the circuit design according to instructions of the assembly language; and computer-usable program code that creates HDL models of first-in-first-outs that couple the HDL models of hardware components for the circuit design according to operands of the instructions of the assembly language representation.
19. The computer program product of claim 18, wherein the computer-usable program code that indicates expected behavior of the hardware system further comprises: computer-usable program code that configures a programmable logic device within a simulation environment with the HDL models; and computer-usable program code that runs the programmable logic device within the simulation environment after configuration.
20. The computer program product of claim 19, further comprising computer-usable program code that, within a host computing system, receives operating state information during simulation for at least one component instantiated within the programmable logic device from an instrumented HDL model.